Terrorism Trends and Determinants

Data Science 1 Project Presentation

Mary Kryslette C. Bunyi

Problem Statement and Background

Motivation

  • Heightened concerns of terrorist activity globally and especially in the Philippines
  • Scarcity of terrorism literature specific to the Philippines
  • Machine learning approach to determining which variables matter most in predicting country-level terrorist activities on an annual basis

Primary Dataset: Global Terrorism Database

In [4]:
# Consolidate the main GTD extract with the separately distributed 1993 data
# (DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent)
gtdfinal = pd.concat([gtd, gtd1993])
gtdfinal
Out[4]:
eventid iyear imonth iday approxdate extended resolution country country_txt region ... addnotes scite1 scite2 scite3 dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related
0 197000000001 1970 7 2 NaN 0 NaT 58 Dominican Republic 2 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
1 197000000002 1970 0 0 NaN 0 NaT 130 Mexico 1 ... NaN NaN NaN NaN PGIS 0 1 1 1 NaN
2 197001000001 1970 1 0 NaN 0 NaT 160 Philippines 5 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
3 197001000002 1970 1 0 NaN 0 NaT 78 Greece 8 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
4 197001000003 1970 1 0 NaN 0 NaT 101 Japan 4 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
743 199312280002 1993 12 28 NaN 0 NaT 159 Peru 3 ... This is one of 2 related attacks (cf. 19931228... “Police Station, Funeral Home are Bombed are B... “Guerillas Hit Hard at Capital of Peru,” The R... NaN CETIS -9 -9 0 -9 NaN
744 199312300001 1993 12 30 NaN 0 NaT 603 United Kingdom 8 ... This is one of 3 related attacks (cf. 19931230... Deric Henderson, “Troops Escape Injury in Land... NaN NaN CETIS 0 0 1 1 NaN
745 199312300002 1993 12 30 NaN 0 NaT 603 United Kingdom 8 ... This is one of 3 related attacks (cf. 19931230... Deric Henderson, “Troops Escape Injury in Land... NaN NaN CETIS 0 0 1 1 NaN
746 199312300003 1993 12 30 NaN 0 NaT 603 United Kingdom 8 ... This is one of 3 related attacks (cf. 19931230... “Police Think IRA To Blame for Wave of Belfast... NaN NaN CETIS 0 0 1 1 NaN
747 199312300004 1993 12 30 NaN 0 NaT 183 South Africa 11 ... NaN “De Klerk condemns "barbaric" attack on tavern... NaN NaN CETIS -9 -9 0 -9 NaN

192212 rows × 135 columns

Socioeconomic, Political, and Geographical Variables

  • World Bank DataBank - Various socioeconomic variables
  • Harvard Dataverse - Human Rights Protection Scores
  • The Correlates of War Project - The World Religion Dataset
  • Our World in Data - Terrain Ruggedness Index and military spending statistics
  • Center for Systemic Peace - Polity

Methods/Approaches Considered

  • Data wrangling: collapsing incident data from the Global Terrorism Database into annual data and then matching it against macro data sources
  • Tools for data wrangling -- NumPy, pandas, and country-converter
  • Tools for data visualization -- Matplotlib, plotnine, and seaborn
  • Tools for predictive modeling -- scikit-learn
  • Tool for mapping -- geopandas
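The core wrangling step described above (collapsing incident-level GTD records into annual country counts, then matching them against macro data sources) can be sketched as follows. The frames, values, and the `gdp_growth` column are hypothetical toy stand-ins, not the project's actual data:

```python
import pandas as pd

# Toy incident-level records standing in for GTD rows (hypothetical data)
incidents = pd.DataFrame({
    "iyear": [2000, 2000, 2001, 2001, 2001],
    "country_txt": ["Philippines", "Philippines", "Philippines", "Peru", "Peru"],
})

# Collapse to one row per country-year: annual incident counts
annual = (incidents
          .groupby(["country_txt", "iyear"])
          .size()
          .reset_index(name="incidents"))

# Toy macro panel (e.g., GDP growth) keyed on the same country-year pair
macro = pd.DataFrame({
    "country_txt": ["Philippines", "Philippines", "Peru"],
    "iyear": [2000, 2001, 2001],
    "gdp_growth": [4.4, 2.9, 0.6],
})

# Left-join so every country-year with recorded incidents is kept
panel = annual.merge(macro, on=["country_txt", "iyear"], how="left")
```

A left join keeps country-years with incidents even when a macro variable is missing, which is where the missingness trade-offs discussed later come from.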

Methods/Tools to Date

  • Mostly data wrangling and data visualization -- pandas; matplotlib, seaborn, plotly; World Bank API
  • Issues encountered
    • country converter package
    • missingness in World Bank variables
In [338]:
# Assess missingness with the missingno matrix plot
import missingno as miss
miss.matrix(wbdata_conso)
Out[338]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d5727b5608>
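Alongside missingno's matrix plot, a quick numeric summary can help decide which World Bank variables are usable. A minimal sketch on a hypothetical toy frame (column names are illustrative, not the actual indicators):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for wbdata_conso (hypothetical values)
df = pd.DataFrame({
    "gdp": [1.0, np.nan, 3.0, np.nan],
    "pop": [10, 20, 30, 40],
})

# Share of missing values per column, worst first
miss_share = df.isna().mean().sort_values(ascending=False)
```

Columns with a high missing share are candidates for dropping or imputation, the trade-off discussed in the lessons-learned section.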

Preliminary Results

In [13]:
g=df0.plot(
    kind='area',
    figsize=(12,8),
    cmap='YlGnBu',
    title="No. of Terrorist Incidents"
)

g.legend(title="Region")
axes2=g.twinx()
p = axes2.plot(df0ph, c='red', label='Philippines',linewidth=2.5)
g = plt.figtext(0.14, 0.02, "Source: Global Terrorism Database")
g = plt.figtext(0.47, 0.4, "Philippines (rhs)",size="x-large",c="red")
plt.savefig("chart0.png",dpi=300,bbox_inches = "tight")
In [17]:
g=df1.plot(
    kind='area',
    figsize=(12,8),
    cmap='YlGnBu',
    title="No. of Civilian Casualties of Terrorist Incidents"
)

g.legend(title="Region")
axes2=g.twinx()
p = axes2.plot(df2, c='red', label='Philippines',linewidth=2.5)
g = plt.figtext(0.14, 0.02, "Source: Global Terrorism Database")
g = plt.figtext(0.49, 0.6, "Philippines (rhs)",size="x-large",c="red")
plt.savefig("chart1.png",dpi=300,bbox_inches = "tight")
In [31]:
fig = px.treemap(df7, path=['World', "Region","Country"], values='Terrorism Incidents',
                  color="Number of Fatalities",
                  color_continuous_scale='thermal')
fig.update_layout(
    title={
        'text': "Terrorism Incidents in the World (1970-2018)",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()
In [325]:
g = sns.FacetGrid(df_attack_conso, col="Attack Type", col_wrap=3, height=5, hue="Location",margin_titles=True,sharey=False,sharex=False)
g.map(sns.lineplot, "Year", "Count")
g.add_legend(loc="upper right")
g.fig.suptitle("Types of Terrorism Attacks through Time",y=1.02)
Out[325]:
Text(0.5, 1.02, 'Types of Terrorism Attacks through Time')

Lessons Learned and Plans to Mitigate Challenges

  • Things are not always as they appear -- e.g., the same country can be labeled differently across data sources, which only surfaces when datasets are merged
  • Trade-off between comprehensive set of variables and data availability

  • To mitigate challenges:

    • weigh the importance of including a variable against the observations lost to its missing values
    • where possible, impute missing values and check results for robustness
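The impute-and-check-robustness idea can be sketched as follows. The panel, column names, and the choice of within-country linear interpolation are illustrative assumptions, not the project's actual procedure:

```python
import numpy as np
import pandas as pd

# Toy country-year panel with gaps in a macro variable (hypothetical data)
panel = pd.DataFrame({
    "country": ["PH"] * 4 + ["PE"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "gini": [44.0, np.nan, 46.0, 47.0, 54.0, 53.0, np.nan, 51.0],
})

# Interpolate within each country so one country's values never fill another's gaps
panel["gini_imp"] = (panel
                     .groupby("country")["gini"]
                     .transform(lambda s: s.interpolate(limit_direction="both")))

# Keep a flag so models can be re-run without imputed rows as a robustness check
panel["imputed"] = panel["gini"].isna()
```

Re-fitting the model with and without the flagged rows shows whether conclusions depend on the imputation.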
In [ ]:
gtd1993 = pd.read_excel("Data/00 Global Terrorism Database/gtd1993_0919dist.xlsx")
In [10]:
# Add column for civilian casualties
nkill = np.where(np.isnan(gtdfinal["nkill"]),0,gtdfinal["nkill"])
nkillter = np.where(np.isnan(gtdfinal["nkillter"]),0,gtdfinal["nkillter"])
gtdfinal["nkillciv"] = nkill-nkillter
gtdfinal["nkillciv"] = gtdfinal["nkillciv"].clip(lower=0)

# Add column for civilian injuries
nwound = np.where(np.isnan(gtdfinal["nwound"]),0,gtdfinal["nwound"])
nwoundte = np.where(np.isnan(gtdfinal["nwoundte"]),0,gtdfinal["nwoundte"])
gtdfinal["nwoundciv"] = nwound-nwoundte
gtdfinal["nwoundciv"] = gtdfinal["nwoundciv"].clip(lower=0)
In [11]:
sns.set_style("white")
In [12]:
# Incidents through time
df0=gtdfinal.groupby(["iyear","region_txt"])[["nkillciv"]].count().unstack()
df0=df0.droplevel(0,axis=1)
df0.index=df0.index.rename("Year")
df0ph=gtdfinal[gtdfinal["country_txt"]=="Philippines"].groupby("iyear")[["nkillciv"]].count()
In [15]:
# Casualties through time
df1=gtdfinal.groupby(["iyear","region_txt"])[["nkillciv"]].sum().unstack()
df1=df1.droplevel(0,axis=1)
df1.index=df1.index.rename("Year")
In [16]:
df2=gtdfinal[gtdfinal["country_txt"]=="Philippines"].groupby("iyear")[["nkillciv"]].sum()
In [18]:
df3=gtdfinal.groupby(["iyear","region_txt"])[["nwoundciv"]].sum().unstack()
df3=df3.droplevel(0,axis=1)
df3.index=df3.index.rename("Year")
In [19]:
df4=gtdfinal[gtdfinal["country_txt"]=="Philippines"].groupby("iyear")[["nwoundciv"]].sum()
In [27]:
# Count the number of incidents recorded per country
df5 = gtdfinal["country_txt"].value_counts().reset_index()
In [28]:
df6=gtdfinal.groupby(["country_txt","region_txt"])[["nkillciv"]].sum().reset_index()
In [240]:
# Get figures by weapon used in terrorism incident
df_weapon1=gtdfinal.groupby(["iyear","region_txt","weaptype1_txt"])[["eventid"]].count().reset_index()
df_weapon2=gtdfinal.groupby(["iyear","region_txt","weaptype2_txt"])[["eventid"]].count().reset_index()
df_weapon3=gtdfinal.groupby(["iyear","region_txt","weaptype3_txt"])[["eventid"]].count().reset_index()
df_weapon4=gtdfinal.groupby(["iyear","region_txt","weaptype4_txt"])[["eventid"]].count().reset_index()

# Make column names consistent
colnames=["Year","Location","Weapon Type","Count"]
df_weapon1.columns = colnames
df_weapon2.columns = colnames
df_weapon3.columns = colnames
df_weapon4.columns = colnames

# Consolidate
df_weapon_conso=pd.concat([df_weapon1,df_weapon2,df_weapon3,df_weapon4])

# Aggregate figures
df_weapon_conso=df_weapon_conso.groupby(["Year","Weapon Type"])[["Count"]].sum().reset_index()
df_weapon_conso["Location"] = "World"
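The four near-identical `groupby` blocks over `weaptype1_txt` through `weaptype4_txt` could alternatively be collapsed with a single `melt`, counting all weapon slots at once. A sketch on a hypothetical two-column toy frame (the real data has four weapon columns):

```python
import pandas as pd

# Toy stand-in for gtdfinal with two of the four weapon-type columns
gtd = pd.DataFrame({
    "eventid": [1, 2, 3],
    "iyear": [1990, 1990, 1991],
    "weaptype1_txt": ["Explosives", "Firearms", "Explosives"],
    "weaptype2_txt": [None, "Explosives", None],
})

# Melt the weapon columns into long form, then count once instead of once per column
long = gtd.melt(id_vars=["iyear"],
                value_vars=["weaptype1_txt", "weaptype2_txt"],
                value_name="Weapon Type")
counts = (long.dropna(subset=["Weapon Type"])
          .groupby(["iyear", "Weapon Type"])
          .size()
          .reset_index(name="Count"))
```

The same pattern applies to the `attacktype*_txt` columns below, avoiding the repeated rename/concat boilerplate.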
In [242]:
# Get equivalent figures for the Philippines
df_weapon_ph1=gtdfinal[gtdfinal["country_txt"] == "Philippines"].groupby(["iyear","country_txt","weaptype1_txt"])[["eventid"]].count().reset_index()
df_weapon_ph2=gtdfinal[gtdfinal["country_txt"] == "Philippines"].groupby(["iyear","country_txt","weaptype2_txt"])[["eventid"]].count().reset_index()
df_weapon_ph3=gtdfinal[gtdfinal["country_txt"] == "Philippines"].groupby(["iyear","country_txt","weaptype3_txt"])[["eventid"]].count().reset_index()
df_weapon_ph4=gtdfinal[gtdfinal["country_txt"] == "Philippines"].groupby(["iyear","country_txt","weaptype4_txt"])[["eventid"]].count().reset_index()


# Make column names consistent
colnames=["Year","Location","Weapon Type","Count"]
df_weapon_ph1.columns = colnames
df_weapon_ph2.columns = colnames
df_weapon_ph3.columns = colnames
df_weapon_ph4.columns = colnames

# Consolidate
df_weapon_ph_conso=pd.concat([df_weapon_ph1,df_weapon_ph2,df_weapon_ph3,df_weapon_ph4])

# Aggregate figures
df_weapon_ph_conso=df_weapon_ph_conso.groupby(["Year","Location","Weapon Type"])[["Count"]].sum().reset_index()
In [243]:
df_weapon_conso = pd.concat([df_weapon_conso, df_weapon_ph_conso])
In [301]:
df_weapon_conso["Weapon Type"]=df_weapon_conso["Weapon Type"].replace('Vehicle (not to include vehicle-borne explosives, i.e., car or truck bombs)','Vehicle')
In [319]:
# Get figures by attack type
df_attack1=gtdfinal.groupby(["iyear","attacktype1_txt"])[["eventid"]].count().reset_index()
df_attack2=gtdfinal.groupby(["iyear","attacktype2_txt"])[["eventid"]].count().reset_index()
df_attack3=gtdfinal.groupby(["iyear","attacktype3_txt"])[["eventid"]].count().reset_index()

# Make column names consistent
colnames=["Year","Attack Type","Count"]
df_attack1.columns = colnames
df_attack2.columns = colnames
df_attack3.columns = colnames

# Consolidate
df_attack_conso=pd.concat([df_attack1,df_attack2,df_attack3])

# Aggregate figures
df_attack_conso=df_attack_conso.groupby(["Year","Attack Type"])[["Count"]].sum().reset_index()
df_attack_conso["Location"] = "World"
In [321]:
# Get equivalent figures for the Philippines
df_attack_ph1=gtdfinal[gtdfinal["country_txt"] == "Philippines"].groupby(["iyear","country_txt","attacktype1_txt"])[["eventid"]].count().reset_index()
df_attack_ph2=gtdfinal[gtdfinal["country_txt"] == "Philippines"].groupby(["iyear","country_txt","attacktype2_txt"])[["eventid"]].count().reset_index()
df_attack_ph3=gtdfinal[gtdfinal["country_txt"] == "Philippines"].groupby(["iyear","country_txt","attacktype3_txt"])[["eventid"]].count().reset_index()

# Make column names consistent
colnames=["Year","Location","Attack Type","Count"]
df_attack_ph1.columns = colnames
df_attack_ph2.columns = colnames
df_attack_ph3.columns = colnames

# Consolidate
df_attack_ph_conso=pd.concat([df_attack_ph1,df_attack_ph2,df_attack_ph3])

# Aggregate figures
df_attack_ph_conso=df_attack_ph_conso.groupby(["Year","Location","Attack Type"])[["Count"]].sum().reset_index()
In [322]:
df_attack_conso = pd.concat([df_attack_conso, df_attack_ph_conso])
In [326]:
# 02 Harvard Dataverse--Human Rights Protection
humanrights = pd.read_csv("Data/02 Harvard Dataverse--Human Rights Protection/HumanRightsProtectionScores_v3.01.csv")
In [327]:
# 03 Correlates of War Project--World Religion
religion = pd.read_csv("Data/03 Correlates of War Project--World Religion/WRP_national.csv")
COWcountrycodes = pd.read_csv("Data/03 Correlates of War Project--World Religion/COW country codes.csv")
In [328]:
# 04 Our World in Data
milexp_gdp = pd.read_csv("Data/04 Our World in Data/military-expenditure-as-share-of-gdp.csv")
milexp_percap = pd.read_csv("Data/04 Our World in Data/military-expenditure-per-capita.csv")
milpers = pd.read_csv("Data/04 Our World in Data/military-personnel.csv")
terrain = pd.read_csv("Data/04 Our World in Data/terrain-ruggedness-index.csv")
In [329]:
# 05 Center for Systemic Peace_Polity
polity = pd.read_excel("Data/05 Center for Systemic Peace_Polity/p5v2018.xls")
In [ ]:
wbdata.search_indicators("gdp growth")
In [ ]:
wbdata.search_indicators("gdp growth")[0]
In [335]:
wbdata_period = datetime.datetime(1970, 1, 1), datetime.datetime(2018, 1, 1)
In [336]:
wbdata_conso = wbdata.get_dataframe({"NY.GDP.PCAP.KD":"GDP per capita (constant 2010 US$)",
                                     "NY.GDP.MKTP.KD.ZG":"GDP growth (annual %)",
                                     "FP.CPI.TOTL.ZG":"Inflation, consumer prices (annual %)",
                                     "SP.POP.TOTL":"Population, total",
                                     "SI.POV.DDAY":"Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)",
                                     "SI.POV.GINI":"Inequality (Gini index)",
                                    "NY.GDP.PETR.RT.ZS":"Oil rents (% of GDP)",
                                    "SE.ADT.LITR.ZS":"Literacy rate, adult total (% of people ages 15 and above)",
                                    "SL.UEM.TOTL.NE.ZS":"Unemployment, total (% of total labor force) (national estimate)",
                                    "GC.NFN.TOTL.GD.ZS":"Net investment in nonfinancial assets (% of GDP)",
                                    "BX.TRF.PWKR.DT.GD.ZS":"Personal remittances, received (% of GDP)"},
                              data_date=wbdata_period,
                              freq="Y")
In [339]:
countries = wbdata.get_country()
In [ ]:
# Attempt to standardize GTD country names
cc = coco.CountryConverter()
country_raw = list(pd.unique(gtdfinal["country_txt"]))
country = cc.convert(names=country_raw, to='name_short')
country
In [ ]:
gtdfinal[gtdfinal["country_txt"]=="International"]
In [ ]:
gtdcountries_old=["East Germany (GDR)",
                  "West Germany (FRG)",
                  "South Vietnam",
                  "People's Republic of the Congo",
                  "International",
                  "Serbia-Montenegro"]  # added so the list aligns one-to-one with gtdcountries_new
In [ ]:
gtdcountries_new=["Germany",
                  "Germany",
                  "Vietnam",
                  "Republic of the Congo",
                  np.nan,
                  "Serbia-Montenegro"]
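A sketch of applying such an old-name-to-new-name mapping with `Series.replace`; the lists and series here are illustrative stand-ins, not the full GTD mapping:

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from historical GTD names to current short names;
# "International" has no country equivalent, so it maps to NaN
old_names = ["East Germany (GDR)", "West Germany (FRG)", "International"]
new_names = ["Germany", "Germany", np.nan]

# Apply the mapping; countries not listed pass through unchanged
s = pd.Series(["East Germany (GDR)", "Philippines", "International"])
s_std = s.replace(dict(zip(old_names, new_names)))
```

Building the dict with `zip` silently drops trailing entries when the lists differ in length, so the two lists must stay the same length.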